### Execute Shell Commands for Setup

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_kaggle_offline.ipynb

Shows example shell commands executed during the setup process, including cloning a git repository and installing Python packages using pip.

```shell
> git clone --depth 1 https://github.com/top-papers/top-papers-graph.git /content/top-papers-graph
> /usr/bin/python3 -m pip install -q ipywidgets pyyaml requests unidecode nbformat
> /usr/bin/python3 -m pip install -q -e .[task3]
```

--------------------------------

### Local Setup and Server Start

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/src/scireason/mcp/USAGE_RU.md

Steps to set up a local Python environment and start the MCP server for testing.

```bash
cd /home/karaluv/proga/top-papers-graph
python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -e '.[mcp]'
```

```bash
PYTHONPATH=src python -m scireason.mcp.server
```

--------------------------------

### Save Quick Setup

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_colab.ipynb

Prints a confirmation that the quick setup has been saved and that the repository installation cell can now be run. Run this before installing the repository.

```python
print("Quick setup saved. You can now run the repository installation cell.")
```

--------------------------------

### Quick Setup UI for Repository Installation

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/task3_dual_local_models_blind_ab_guarded_fixed.ipynb

This code initializes and displays a UI for quick data input before repository installation. It includes an accordion for different settings and buttons for applying or resetting. Use this for early data entry; verification is in a separate cell.
```python
# Assumes `import ipywidgets as W`, `from IPython.display import display`,
# and that the individual widgets were created in earlier cells.
basic_box = W.VBox([
    input_mode,
    W.HBox([task1_yaml_path, task1_yaml_upload]),
    W.HBox([processed_path, processed_upload]),
    W.HBox([commands_file_path, commands_upload]),
    query,
    identifiers_text,
    commands_text,
])
expert_box = W.VBox([expert_last, expert_first, expert_pat, domain_id, out_dir])
model_alpha_box = W.VBox([model_a_owner_label, model_a_vlm_backend, model_a_vlm_model_id, model_a_local_text_model])
model_beta_box = W.VBox([model_b_owner_label, model_b_vlm_backend, model_b_vlm_model_id, model_b_local_text_model])
flags_box = W.VBox([hf_offline, include_multimodal, run_vlm, raw_yaml_override])

accordion = W.Accordion(children=[
    basic_box,
    W.HBox([expert_box, flags_box]),
    W.HBox([model_alpha_box, model_beta_box]),
])
accordion.set_title(0, 'Input data')
accordion.set_title(1, 'Expert and flags')
accordion.set_title(2, 'Models α / β')

display(W.VBox([
    W.HTML(
        'Quick setup before installing the repository'
        '\n '
        'This cell is only for early data entry. Validation is done in the next, separate cell.'
    ),
    accordion,
    W.HBox([quick_apply_btn, quick_reset_btn]),
    quick_status_html,
    quick_summary_html,
]))

_task3_quick_apply_to_globals(show_message=False)
if TASK3_QUICK_INITIAL_YAML_ERROR:
    quick_status_html.value = _task3_quick_render_box(
        'There is a problem in the saved YAML override',
        lines=[TASK3_QUICK_INITIAL_YAML_ERROR],
        tone='warning',
    )
```

--------------------------------

### Install Dependencies and Setup Environment

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/utils/telegram_bot/README.md

Navigate to the telegram_bot directory, create a virtual environment, activate it, install requirements, and copy the environment file. Remember to fill in your bot token in the .env file.

```bash
cd telegram_bot
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env  # then add your bot token to .env
```

--------------------------------

### Run PostgreSQL Setup Script

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/third_party/top-papers-bot/README.md

Executes the script to install and configure PostgreSQL, create a user, database, and subscriptions table. May require sudo privileges.

```bash
sudo ./make_postgres.sh
```

--------------------------------

### Install Project Dependencies

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/course/weeks/week01.md

Use this script to install and configure project dependencies. It handles setup automatically. Available for both bash and PowerShell environments.

```shell
./scripts/bootstrap.sh
```

```powershell
./scripts/bootstrap.ps1
```

--------------------------------

### Install Project Dependencies

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/docs/quickstart.md

Sets up a virtual environment and installs the project with development dependencies. On Windows, use the activation command shown in the comment instead.
```bash
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
```

--------------------------------

### Local Environment Setup with Pip

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/TUTORIAL_QWEN3VL_DATASPHERE_BUDGET_RU.md

Commands to create a virtual environment, activate it, and install necessary Python packages (datasphere, pyyaml, huggingface_hub) for local development.

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -U datasphere pyyaml huggingface_hub
datasphere version
```

--------------------------------

### Install and Activate DataSphere CLI

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/experiments/vlm_finetuning/datasphere/TUTORIAL_FULL_EXPERIMENT_RU.md

Activate the virtual environment and install or upgrade the DataSphere CLI. Verify the installation with the version command.

```bash
source .venv/bin/activate
python -m pip install -U datasphere
datasphere version
```

--------------------------------

### Apply Setup to Widgets

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/task3_dual_local_models_blind_ab_guarded_fixed.ipynb

Applies a given setup configuration to the various widgets in the UI. If no setup is provided, it first refreshes the setup from global variables. This function is crucial for initializing or updating the UI based on configuration.
```python
def _apply_setup_to_widgets(setup=None, *, announce=True):
    setup = setup or _refresh_setup_from_globals()
    input_mode.value = setup['input_mode']
    trajectory_path.value = setup['task1_yaml_path']
    processed_path.value = setup['processed_path']
    commands_path.value = setup['commands_file_path']
    query.value = setup['query']
    identifiers_text.value = '\n'.join(setup['identifiers'])
    commands_text.value = setup['commands_text']
    expert_last.value = setup['expert']['last_name']
    expert_first.value = setup['expert']['first_name']
    expert_pat.value = setup['expert']['patronymic']
    domain_id.value = setup['domain_id']
    out_dir.value = setup['out_dir']
    search_limit.value = int(setup['search_limit'])
    top_papers.value = int(setup['top_papers'])
    top_hypotheses.value = int(setup['top_hypotheses'])
    candidate_top_k.value = int(setup['candidate_top_k'])
    top_pairs.value = int(setup['top_pairs'])
    annoy_n_trees.value = int(setup['annoy_n_trees'])
    annoy_top_k.value = int(setup['annoy_top_k'])
    include_multimodal.value = bool(setup['include_multimodal'])
    run_vlm.value = bool(setup['run_vlm'])
    edge_mode.value = setup['edge_mode']
    link_prediction_backend.value = setup['link_prediction_backend']
    model_a_owner_label.value = setup['model_a']['owner_label']
    model_a_vlm_backend.value = setup['model_a']['vlm_backend']
    model_a_vlm_model_id.value = setup['model_a']['vlm_model_id'] or TASK3_DEFAULT_LOCAL_VLM_MODEL
```

--------------------------------

### Initial Setup and Imports

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_kaggle_offline.ipynb

Ensures the setup is validated before proceeding with imports and environment variable configurations. Sets up environment variables for asynchronous operations and timeouts.
```python
if '_task3_require_validated_setup' not in globals():
    raise RuntimeError('Run the quick setup cell and the separate validation cell first.')
_task3_require_validated_setup()
```

```python
import gc, io, json, os, sys, tempfile, subprocess, zipfile, shutil, traceback, inspect
from pathlib import Path
import importlib.util

os.environ.setdefault('G4F_ASYNC_ENABLED', '1')
os.environ.setdefault('G4F_ASYNC_MAX_CONCURRENCY', '3')
os.environ.setdefault('G4F_ASYNC_RETRIES', '3')
os.environ.setdefault('G4F_ASYNC_MAX_MODELS_PER_REQUEST', '3')
os.environ.setdefault('LLM_REQUEST_TIMEOUT_SECONDS', '25')
os.environ.setdefault('VLM_REQUEST_TIMEOUT_SECONDS', '45')
```

--------------------------------

### Apply Quick Setup Configuration

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_kaggle_offline.ipynb

Applies the current values from interactive widgets to global variables, updating the quick setup summary and status messages. This function is triggered when the 'Apply Quick Setup' button is clicked.

```python
def _task3_quick_on_apply(_):
    _task3_quick_apply_to_globals(show_message=True)

quick_apply_btn.on_click(_task3_quick_on_apply)
```

--------------------------------

### Apply Setup Configuration

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_colab.ipynb

Callback function to apply the current setup configuration to the form widgets. It's triggered when the 'Apply Setup' button is clicked.

```python
def _on_apply_setup(_):
    _apply_setup_to_widgets(_refresh_setup_from_globals(), announce=True)
```

--------------------------------

### Apply Setup Configuration to Widgets

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_kaggle_offline.ipynb

Applies a predefined setup configuration to the notebook's widgets. This is useful for quickly populating form fields with known values. After applying the values it re-validates the form, and if `announce` is set it reports that the form was synchronized and whether the YAML override failed to parse.

```python
    # Fragment: continues the body of `_apply_setup_to_widgets(setup, *, announce)`.
    model_a_local_text_model.value = setup['model_a']['local_text_model']
    model_b_owner_label.value = setup['model_b']['owner_label']
    model_b_vlm_backend.value = setup['model_b']['vlm_backend']
    model_b_vlm_model_id.value = setup['model_b']['vlm_model_id']
    model_b_local_text_model.value = setup['model_b']['local_text_model']
    hf_offline.value = bool(setup['hf_offline'])
    create_offline_form.value = bool(setup['create_offline_form'])
    create_expert_bundle.value = bool(setup['create_expert_bundle'])
    auto_download_offline.value = bool(setup['auto_download_offline'])
    auto_download_bundle.value = bool(setup['auto_download_bundle'])
    auto_download_owner_key.value = bool(setup['auto_download_owner_key'])
    report = _validate_task3dual_snapshot(_collect_form_snapshot())
    _set_validation_box(report, title='Validation after applying the quick setup')
    if announce:
        note_lines = ['Values from the top cell have been applied to the form.']
        if TASK3_DUAL_SETUP_PARSE_ERROR:
            note_lines.append('The YAML override could not be parsed; values from the dictionary and defaults were used.')
        _set_status_box('Form synchronized', lines=note_lines, tone='neutral')
```

--------------------------------

### Require Validated Setup Function

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_kaggle_offline.ipynb

Checks if the quick setup has been validated and is in a usable state. Raises a `Task3SetupBlocked` exception if the setup is dirty, not yet validated, failed validation, or has an outdated fingerprint.
```python
def _task3_require_validated_setup():
    # 1. The setup was edited after the last validation.
    if globals().get('TASK3_QUICK_SETUP_DIRTY'):
        raise Task3SetupBlocked(
            'Run blocked: the setup was changed after the last validation',
            [
                'Click "Save quick setup".',
                'Then run the "Quick setup validation" cell again.',
            ],
        )
    # 2. No validation report exists yet.
    report = globals().get('TASK3_QUICK_VALIDATION_REPORT')
    if not report:
        raise Task3SetupBlocked(
            'Run blocked: the quick setup has not been validated yet',
            [
                'Run the "Quick setup validation" cell first.',
            ],
        )
    # 3. Validation ran but did not pass.
    if not globals().get('TASK3_QUICK_SETUP_VALIDATED_OK'):
        lines = list((report or {}).get('errors') or [])
        reason = globals().get('TASK3_QUICK_SETUP_BLOCK_REASON')
        if reason:
            lines = [reason] + lines
        if not lines:
            lines = ['Fix the config and rerun the validation cell.']
        raise Task3SetupBlocked('Run blocked: the quick setup failed validation', lines[:8])
    # 4. The validated fingerprint no longer matches the current one.
    current_fp = str(globals().get('TASK3_QUICK_SETUP_CURRENT_FINGERPRINT') or '')
    validated_fp = str(globals().get('TASK3_QUICK_SETUP_VALIDATED_FINGERPRINT') or '')
    if current_fp and validated_fp and current_fp != validated_fp:
        raise Task3SetupBlocked(
            'Run blocked: the validation is outdated',
            [
                'The saved configuration changed after the last validation.',
                'Run the "Quick setup validation" cell again.',
            ],
        )
    return True
```

--------------------------------

### Install and Import Libraries

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_colab.ipynb

Imports the standard-library modules the notebook needs and sets default environment variables for asynchronous operations and timeouts. If the smoke-test flag `TASK3_DUAL_NOTEBOOK_SMOKE` is set to `1`, it also configures mock providers and Hugging Face offline mode.
```python
import gc, io, json, os, sys, tempfile, subprocess, zipfile, shutil, traceback, inspect
from pathlib import Path

os.environ.setdefault('G4F_ASYNC_ENABLED', '1')
os.environ.setdefault('G4F_ASYNC_MAX_CONCURRENCY', '3')
os.environ.setdefault('G4F_ASYNC_RETRIES', '3')
os.environ.setdefault('G4F_ASYNC_MAX_MODELS_PER_REQUEST', '3')
os.environ.setdefault('LLM_REQUEST_TIMEOUT_SECONDS', '25')
os.environ.setdefault('VLM_REQUEST_TIMEOUT_SECONDS', '45')

if os.environ.get('TASK3_DUAL_NOTEBOOK_SMOKE') == '1':
    os.environ.setdefault('LLM_PROVIDER', 'mock')
    os.environ.setdefault('LLM_MODEL', 'mock')
    os.environ.setdefault('EMBED_PROVIDER', 'hash')
    os.environ.setdefault('MM_EMBED_BACKEND', 'none')
    os.environ.setdefault('VLM_BACKEND', 'none')
    os.environ.setdefault('HF_HUB_OFFLINE', '1')
    os.environ.setdefault('TRANSFORMERS_OFFLINE', '1')
    os.environ.setdefault('HF_DATASETS_OFFLINE', '1')
```

--------------------------------

### Install Dependencies and Clone Repository

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/top_papers_graph_scidatapipe_hf_colab_from_csv_only_fixed_gdown_scope_fixed.ipynb

Installs necessary system packages and Python libraries, then clones the specified top-papers-graph repository and installs it in editable mode. This sets up the environment for subsequent operations.
```python
# @title
import os
import sys
import shutil
import subprocess
from pathlib import Path

# REPO_BRANCH and REPO_URL are expected to be defined in an earlier cell.

def run(cmd, cwd=None):
    # Echo the command, then fail fast on a non-zero exit code.
    print("+ ", " ".join(cmd))
    subprocess.run(cmd, check=True, cwd=cwd)

run(["apt-get", "update", "-qq"])
run(["apt-get", "install", "-y", "-qq", "git", "p7zip-full", "unzip"])
run([sys.executable, "-m", "pip", "install", "-q", "-U", "pip", "setuptools", "wheel"])
run([sys.executable, "-m", "pip", "install", "-q", "-U", "gdown", "huggingface_hub", "pandas", "openpyxl", "PyYAML", "Unidecode", "requests"])

# Always start from a fresh clone.
repo_dir = Path("/content/top-papers-graph")
if repo_dir.exists():
    shutil.rmtree(repo_dir)

run(["git", "clone", "--depth", "1", "--branch", REPO_BRANCH, REPO_URL, str(repo_dir)])
run([sys.executable, "-m", "pip", "install", "-q", "-e", "."], cwd=str(repo_dir))
print("Repository installed:", repo_dir)
```

--------------------------------

### Python Function to Apply Quick Setup to Globals

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/task3_dual_local_models_blind_ab_guarded_kaggle_offline.ipynb

Applies a collected snapshot to global variables, merging it with the default setup and handling YAML overrides. It also generates a fingerprint for the current setup and resets the validation flags, so the setup must be validated again before use.
```python
def _task3_quick_apply_to_globals(show_message=True):
    snapshot = _task3_quick_collect_snapshot()
    merged = _task3_quick_deep_update(_task3_quick_default_setup(), snapshot)
    merged.pop('yaml_override_text', None)
    merged.pop('upload_notes', None)
    # A YAML override, if present and parseable, wins over the widget values.
    yaml_text = str(snapshot.get('yaml_override_text') or '').strip()
    yaml_payload, yaml_error = _task3_quick_parse_yaml(yaml_text)
    if yaml_payload:
        merged = _task3_quick_deep_update(merged, yaml_payload)
    fingerprint = _task3_quick_snapshot_fingerprint(snapshot)
    globals()['TASK3_DUAL_SETUP'] = merged
    globals()['TASK3_DUAL_SETUP_YAML'] = yaml_text
    globals()['TASK3_QUICK_LAST_SNAPSHOT'] = snapshot
    globals()['TASK3_QUICK_LAST_YAML_ERROR'] = yaml_error
    globals()['TASK3_QUICK_SETUP_CURRENT_FINGERPRINT'] = fingerprint
    # Saving invalidates any earlier validation.
    globals()['TASK3_QUICK_SETUP_DIRTY'] = False
    globals()['TASK3_QUICK_SETUP_VALIDATED_OK'] = False
    globals()['TASK3_QUICK_SETUP_VALIDATED_FINGERPRINT'] = ''
    globals()['TASK3_QUICK_SETUP_VALIDATION_REQUIRED'] = True
    globals()['TASK3_QUICK_SETUP_BLOCK_REASON'] = 'After saving, the quick setup must be validated in a separate cell.'
    summary = {
        'input_mode': merged['input_mode'],
        'task1_yaml_path': merged['task1_yaml_path'],
        'processed_path': merged['processed_path'],
        'commands_file_path': merged['commands_file_path'],
        'query': merged['query'],
        'identifiers_count': len(merged['identifiers']),
        'model_a': merged['model_a'],
        'model_b': merged['model_b'],
        'hf_offline': merged['hf_offline'],
    }
```

--------------------------------

### Launch VLM Smoke Example

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/DATASPHERE_SMOKE_EVAL_IMAGES_COLUMN_FIX_RU.md

This command sequence is used to launch the VLM smoke example after setting up the environment and project.
```bash
cd top-papers-graph-main
source .venv/bin/activate
export DATASPHERE_PROJECT_ID='bt18pnosk97i8n24ddnv'
datasphere project get --id "$DATASPHERE_PROJECT_ID"
bash experiments/vlm_finetuning/datasphere/launch_examples.sh hf-smoke-managed
```

--------------------------------

### Install smolagents with Agents Support

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/docs/smolagents.md

Install the project with basic agent support. This is the minimum requirement for agent functionality.

```bash
pip install -e ".[agents]"
```

--------------------------------

### Apply Setup to Widgets

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_colab.ipynb

Applies a given setup configuration to the various widgets on the page. Use this to pre-fill form fields based on a saved or default configuration.

```python
def _apply_setup_to_widgets(setup=None, *, announce=True):
    setup = setup or _refresh_setup_from_globals()
    input_mode.value = setup['input_mode']
    trajectory_path.value = setup['task1_yaml_path']
    processed_path.value = setup['processed_path']
    commands_path.value = setup['commands_file_path']
    query.value = setup['query']
    identifiers_text.value = '\n'.join(setup['identifiers'])
    commands_text.value = setup['commands_text']
    expert_last.value = setup['expert']['last_name']
    expert_first.value = setup['expert']['first_name']
    expert_pat.value = setup['expert']['patronymic']
    domain_id.value = setup['domain_id']
    out_dir.value = setup['out_dir']
    search_limit.value = int(setup['search_limit'])
    top_papers.value = int(setup['top_papers'])
    top_hypotheses.value = int(setup['top_hypotheses'])
    candidate_top_k.value = int(setup['candidate_top_k'])
    top_pairs.value = int(setup['top_pairs'])
    annoy_n_trees.value = int(setup['annoy_n_trees'])
    annoy_top_k.value = int(setup['annoy_top_k'])
    include_multimodal.value = bool(setup['include_multimodal'])
    run_vlm.value = bool(setup['run_vlm'])
    edge_mode.value = setup['edge_mode']
    link_prediction_backend.value = setup['link_prediction_backend']
    model_a_owner_label.value = setup['model_a']['owner_label']
    model_a_vlm_backend.value = setup['model_a']['vlm_backend']
    model_a_vlm_model_id.value = setup['model_a']['vlm_model_id'] or TASK3_DEFAULT_LOCAL_VLM_MODEL
    model_a_local_text_model.value = setup['model_a']['local_text_model']
    model_b_owner_label.value = setup['model_b']['owner_label']
    model_b_vlm_backend.value = setup['model_b']['vlm_backend']
    model_b_vlm_model_id.value = setup['model_b']['vlm_model_id']
```

--------------------------------

### Key-Value Store for Form Fields

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/scripts/healthboard/frontend/output/singlepage.html

The `xg` class provides a key-value store for form fields. Each key is a field path serialized as `type:value` segments joined by the `__@field_split__` separator. It includes methods for setting, getting, updating, deleting, and mapping entries, plus `toJSON`.

```javascript
var bg="__@field_split__";
function yg(e){return e.map(function(e){return"".concat(y(e),":").concat(e)}).join(bg)}
var xg=function(){function e(){ft(this,e),R(this,"kvs",new Map)}return mt(e,[
{key:"set",value:function(e,t){this.kvs.set(yg(e),t)}},
{key:"get",value:function(e){return this.kvs.get(yg(e))}},
{key:"update",value:function(e,t){var n=t(this.get(e));n?this.set(e,n):this.delete(e)}},
{key:"delete",value:function(e){this.kvs.delete(yg(e))}},
{key:"map",value:function(e){return l(this.kvs.entries()).map(function(t){var n=N(t,2),r=n[0],o=n[1],i=r.split(bg);
return e({key:i.map(function(e){var t=N(e.match(/^([^:]*):(.*)$/),3),n=t[1],r=t[2];return"number"===n?Number(r):r}),value:o})})}},
{key:"toJSON",value:function(){var e={};
return this.map(function(t){var n=t.key,r=t.value;
return e[n.join(".")] = r,null}),e}}
]),e}();
const wg=xg;
```

--------------------------------

### Install and Update Python Dependencies

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/experiments/vlm_finetuning/datasphere/TUTORIAL_FULL_EXPERIMENT_RU.md

Installs and updates necessary Python packages including datasphere, pyyaml, and huggingface_hub. This is for local environment setup.

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -U datasphere pyyaml huggingface_hub
```

--------------------------------

### Run Course Demo

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/course/README.md

Execute the bootstrap script and then run the demo for the project. This is the recommended starting point for the course.

```bash
./scripts/bootstrap.sh
top-papers-graph demo-run
```

--------------------------------

### Make PostgreSQL Script Executable

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/third_party/top-papers-bot/README.md

Grants execute permissions to the PostgreSQL setup script. This script automates the installation and configuration of PostgreSQL.

```bash
chmod +x make_postgres.sh
```

--------------------------------

### Python: Collect Snapshot for Quick Setup

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_kaggle_offline.ipynb

Gathers current settings and file paths from various input widgets and global variables to create a snapshot of the setup. Handles file uploads and appends notes about saved files.
```python
def _task3_quick_collect_snapshot():
    yaml_upload_path = ''
    processed_upload_path = ''
    commands_upload_path = ''
    upload_notes = []
    # Each upload is saved independently so one failure does not block the others.
    try:
        if task1_yaml_upload.value:
            yaml_upload_path = _task3_quick_save_upload(task1_yaml_upload.value, 'task1.yaml')
            upload_notes.append(f'Task1 YAML saved: {yaml_upload_path}')
    except Exception as e:
        upload_notes.append(f'Failed to save Task1 YAML upload: {type(e).__name__}: {e}')
    try:
        if processed_upload.value:
            processed_upload_path = _task3_quick_save_upload(processed_upload.value, 'processed_papers.zip')
            upload_notes.append(f'Processed ZIP saved: {processed_upload_path}')
    except Exception as e:
        upload_notes.append(f'Failed to save processed upload: {type(e).__name__}: {e}')
    try:
        if commands_upload.value:
            commands_upload_path = _task3_quick_save_upload(commands_upload.value, 'commands.yaml')
            upload_notes.append(f'Commands file saved: {commands_upload_path}')
    except Exception as e:
        upload_notes.append(f'Failed to save commands upload: {type(e).__name__}: {e}')
    # An uploaded file takes precedence over a manually entered path.
    return {
        'input_mode': input_mode.value,
        'task1_yaml_path': (yaml_upload_path or task1_yaml_path.value or '').strip(),
        'processed_path': (processed_upload_path or processed_path.value or '').strip(),
        'commands_file_path': (commands_upload_path or commands_file_path.value or '').strip(),
        'query': query.value.strip(),
        'identifiers': [line.strip() for line in identifiers_text.value.splitlines() if line.strip()],
        'commands_text': commands_text.value.strip(),
        'expert': {
            'last_name': expert_last.value.strip(),
            'first_name': expert_first.value.strip(),
            'patronymic': expert_pat.value.strip() or '-',
        },
        'domain_id': domain_id.value.strip() or 'science',
        'out_dir': out_dir.value.strip() or 'runs/task3_dual_local_blind_ab',
        'hf_offline': bool(hf_offline.value),
        'include_multimodal': bool(include_multimodal.value),
        'run_vlm': bool(run_vlm.value),
        'model_a': {
            'owner_label': model_a_owner_label.value.strip() or 'base_local_model',
            'vlm_backend': model_a_vlm_backend.value,
            'vlm_model_id': model_a_vlm_model_id.value.strip(),
            'local_text_model': model_a_local_text_model.value.strip(),
        },
        'model_b': {
            'owner_label': model_b_owner_label.value.strip() or 'finetuned_local_model',
            'vlm_backend': model_b_vlm_backend.value,
            'vlm_model_id': model_b_vlm_model_id.value.strip(),
            'local_text_model': model_b_local_text_model.value.strip(),
        },
        'yaml_override_text': raw_yaml_override.value,
        'upload_notes': upload_notes,
    }
```

--------------------------------

### Unpack and Navigate to Project Directory

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/TUTORIAL_QWEN3_VL_8B_DATASPHERE_SMOKE_FIX_RU.md

Create a directory, change into it, unzip the provided archive, and navigate into the extracted project folder. This sets up the project files for use.

```bash
mkdir -p ~/Documents/top-papers-graph-fixed
cd ~/Documents/top-papers-graph-fixed
unzip /path/to/top-papers-graph-main-datasphere-smoke-fix.zip
cd top-papers-graph-main
```

--------------------------------

### Environment Setup and Path Initialization

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_kaggle_gdown_script_clone_models_fixed.ipynb

Initializes workspace, output, and Hugging Face cache directories, and sets environment variables for timeouts, DPI, and VLM parameters. Ensures necessary directories are created.
```python
import importlib
import json
import os
import shutil
import subprocess
import sys
import zipfile
from pathlib import Path

WORKSPACE_DIR = Path(CFG["workspace_dir"]).expanduser()
OUTPUT_DIR = Path(CFG["output_dir"]).expanduser()
HF_HOME = Path(str(CFG.get("hf_home") or "/kaggle/working/.hf")).expanduser()
HF_HOME.mkdir(parents=True, exist_ok=True)

os.environ.setdefault("HF_HOME", str(HF_HOME))
os.environ.setdefault("HF_HUB_CACHE", str(HF_HOME / "hub"))
os.environ.setdefault("TRANSFORMERS_CACHE", str(HF_HOME / "hub"))
os.environ.setdefault("LOCAL_VLM_REQUEST_TIMEOUT_SECONDS", str(CFG.get("local_vlm_request_timeout_seconds") or 120))
os.environ.setdefault("SCIREASON_LOCAL_VLM_REQUEST_TIMEOUT_SECONDS", os.environ.get("LOCAL_VLM_REQUEST_TIMEOUT_SECONDS", "300"))
os.environ.setdefault("LOCAL_VLM_STARTUP_TIMEOUT_SECONDS", str(CFG.get("local_vlm_startup_timeout_seconds") or 900))
os.environ.setdefault("SCIREASON_LOCAL_VLM_STARTUP_TIMEOUT_SECONDS", os.environ.get("LOCAL_VLM_STARTUP_TIMEOUT_SECONDS", "900"))
os.environ.setdefault("PDF_RENDER_DPI", str(CFG.get("pdf_render_dpi") or 110))
os.environ.setdefault("VLM_MAX_PIXELS", str(CFG.get("vlm_max_pixels") or (768 * 28 * 28)))
os.environ.setdefault("VLM_MAX_NEW_TOKENS", str(CFG.get("vlm_max_new_tokens") or 192))
os.environ.setdefault("HF_HUB_DISABLE_PROGRESS_BARS", "1")
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")

WORKSPACE_DIR.mkdir(parents=True, exist_ok=True)
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
```

--------------------------------

### Install Task3 Dependencies

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_kaggle_offline.ipynb

Installs or verifies the installation of required Python packages for the task3 runtime. It handles both online and offline scenarios, including optional dependency installation.
```python
_required_boot_modules = ['ipywidgets', 'yaml', 'requests', 'nbformat']
_missing_boot_modules = [name for name in _required_boot_modules if not _task3_module_available(name)]
if _missing_boot_modules and not TASK3_RUNTIME_OFFLINE:
    run_optional([sys.executable, '-m', 'pip', 'install', '-q', 'ipywidgets', 'pyyaml', 'requests', 'unidecode', 'nbformat'])
elif _missing_boot_modules:
    print('[warn] offline mode: skipping pip install for base notebook dependencies ->', ', '.join(_missing_boot_modules))
```

```python
_task3_editable_install = [sys.executable, '-m', 'pip', 'install', '-q', '-e', '.[task3]']
# Offline or smoke runs install the package itself without resolving dependencies.
if TASK3_RUNTIME_OFFLINE or os.environ.get('TASK3_NOTEBOOK_SMOKE') == '1' or os.environ.get('TASK3_DUAL_NOTEBOOK_SMOKE') == '1' or os.environ.get('TASK3_NOTEBOOK_SKIP_OPTIONAL_DEPS') == '1':
    _task3_editable_install = [sys.executable, '-m', 'pip', 'install', '-q', '-e', '.', '--no-deps']
if not run_optional(_task3_editable_install, cwd=REPO_DIR, label='pip install editable task3 notebook runtime'):
    # If the full install failed, retry once without dependencies.
    if _task3_editable_install[-1:] != ['--no-deps']:
        run_optional([sys.executable, '-m', 'pip', 'install', '-q', '-e', '.', '--no-deps'], cwd=REPO_DIR, label='pip install editable task3 notebook runtime --no-deps fallback')
```

--------------------------------

### Launch Full Experiment (Recommended)

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/TUTORIAL_QWEN3VL_DATASPHERE_BUDGET_RU.md

Launches the full Qwen3VL experiment using the managed launcher with a specified project ID. Requires setting the DATASPHERE_PROJECT_ID environment variable.

```bash
export DATASPHERE_PROJECT_ID=''
bash experiments/vlm_finetuning/datasphere/launch_examples.sh hf-full-managed
```

--------------------------------

### Install Python Dependencies

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/third_party/top-papers-bot/README.md

Installs all necessary Python libraries for the bot. Ensure you have Python 3.8+ and pip installed.
```bash pip install -r requirements.txt ``` -------------------------------- ### Install Project Dependencies Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/task3_dual_local_models_blind_ab.ipynb Installs necessary project dependencies using pip. This includes optional packages and an editable install for the task-specific components. ```python run_optional([sys.executable, '-m', 'pip', 'install', '-q', 'ipywidgets', 'pyyaml', 'requests', 'unidecode', 'nbformat']) _task3_editable_install = [sys.executable, '-m', 'pip', 'install', '-q', '-e', '.[task3]'] ``` -------------------------------- ### Initialize Repository Directory Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_colab.ipynb Sets up the repository directory, cloning it from GitHub if it doesn't exist. It prioritizes the directory found by `find_repo_root` or defaults to a path within /content or the current directory. ```python REPO_DIR = find_repo_root() REPO_URL = 'https://github.com/top-papers/top-papers-graph.git' if REPO_DIR is None: REPO_DIR = Path('/content/top-papers-graph') if Path('/content').exists() else Path.cwd() / 'top-papers-graph' if not REPO_DIR.exists(): run(['git', 'clone', '--depth', '1', REPO_URL, str(REPO_DIR)]) SRC_DIR = REPO_DIR / 'src' ``` -------------------------------- ### Display UI for Quick Setup Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/task3_dual_local_models_blind_ab_guarded_kaggle_offline.ipynb Renders the UI elements for a quick setup, including an accordion for input data, expert settings, flags, and model configurations, along with apply and reset buttons. ```python Result: VBox(children=(HTML(value='
Быстрый сетап до у…
```

--------------------------------

### Install Task 3 Dependencies

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_multimodal_temporal_hypothesis_generation_colab.ipynb

Install the necessary packages for Task 3 using pip. This command installs the package in editable mode with the `task3` extras.

```bash
pip install -e ".[task3]"
```

--------------------------------

### Environment Setup for Repository

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_kaggle_gdown_script_vlm_timeout_fixed.ipynb

Initializes the repository source, extracts archives if necessary, and sets up the Python path to include the repository's source and root directories.

```python
kind, source_path = _ensure_repo_source()
repo_extract_dir = WORKSPACE_DIR / "repo"
# Start from a clean extraction directory on every run.
if repo_extract_dir.exists():
    shutil.rmtree(repo_extract_dir)
repo_extract_dir.mkdir(parents=True, exist_ok=True)

if kind == "archive":
    print(f"[repo] extract archive: {source_path}")
    with zipfile.ZipFile(source_path, "r") as zf:
        zf.extractall(repo_extract_dir)
    repo_candidates = [p for p in repo_extract_dir.rglob("*") if p.is_dir() and (p / "pyproject.toml").exists() and (p / "src" / "scireason").exists()]
    if not repo_candidates:
        raise FileNotFoundError("Repository root not found after extracting the archive")
    # Prefer the candidate with the shortest path.
    REPO_ROOT = sorted(repo_candidates, key=lambda p: len(str(p)))[0]
else:
    REPO_ROOT = source_path

SRC_DIR = REPO_ROOT / "src"
os.environ["PYTHONPATH"] = os.pathsep.join([str(SRC_DIR), str(REPO_ROOT), str(os.environ.get("PYTHONPATH") or "")]).strip(os.pathsep)
if str(SRC_DIR) not in sys.path:
    sys.path.insert(0, str(SRC_DIR))
if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))
print("[repo] root =", REPO_ROOT)
```

--------------------------------

### Install GNN Dependencies

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/course/weeks/week11.md

Install the necessary dependencies for the
GNN mode. This command installs the project in editable mode with optional GNN extras.

```bash
pip install -e ".[gnn]"  # optionally: .[gnn_ext]
```

--------------------------------

### Refresh Setup From Globals

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_colab.ipynb

Refreshes the global effective setup and parse error variables. Call this function when the setup configuration might have changed.

```python
def _refresh_setup_from_globals():
    global TASK3_DUAL_EFFECTIVE_SETUP, TASK3_DUAL_SETUP_PARSE_ERROR
    TASK3_DUAL_EFFECTIVE_SETUP, TASK3_DUAL_SETUP_PARSE_ERROR = _build_task3_dual_setup()
    return TASK3_DUAL_EFFECTIVE_SETUP
```

--------------------------------

### Repository Setup Script

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_kaggle_gdown_script.ipynb

This script sets up the project repository by locating and extracting it. It handles finding the repository from archives or directories and creates necessary workspace and output directories. Ensure the repository is accessible via specified paths or Kaggle datasets.
```python
import json
import os
import shutil
import subprocess
import sys
import zipfile
from pathlib import Path

WORKSPACE_DIR = Path(CFG["workspace_dir"]).expanduser()
OUTPUT_DIR = Path(CFG["output_dir"]).expanduser()
WORKSPACE_DIR.mkdir(parents=True, exist_ok=True)
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)


def _find_repo_source():
    # 1) Explicit archive path from the config.
    explicit_archive = str(CFG.get("repo_archive_path") or "").strip()
    if explicit_archive:
        p = Path(explicit_archive)
        if p.exists():
            return ("archive", p)
    # 2) Explicit directory path from the config.
    explicit_dir = str(CFG.get("repo_dir_path") or "").strip()
    if explicit_dir:
        p = Path(explicit_dir)
        if (p / "pyproject.toml").exists() and (p / "src" / "scireason").exists():
            return ("dir", p)
    # 3) Search well-known locations for an archive or a checked-out repository.
    search_roots = [Path("/kaggle/input"), Path("/kaggle/working"), Path.cwd(), Path("/mnt/data")]
    patterns = ("top-papers-graph*.zip", "top-papers-graph-main*", "top-papers-graph*")
    for base in search_roots:
        if not base.exists():
            continue
        for pattern in patterns:
            for candidate in sorted(base.rglob(pattern)):
                if candidate.is_file() and candidate.suffix == ".zip":
                    return ("archive", candidate)
                if candidate.is_dir() and (candidate / "pyproject.toml").exists() and (candidate / "src" / "scireason").exists():
                    return ("dir", candidate)
    raise FileNotFoundError("Could not find a top-papers-graph archive/folder. Attach a dataset or set repo_archive_path / repo_dir_path.")


kind, source_path = _find_repo_source()
repo_extract_dir = WORKSPACE_DIR / "repo"
if repo_extract_dir.exists():
    shutil.rmtree(repo_extract_dir)
repo_extract_dir.mkdir(parents=True, exist_ok=True)

if kind == "archive":
    print(f"[repo] extract archive: {source_path}")
    with zipfile.ZipFile(source_path, "r") as zf:
        zf.extractall(repo_extract_dir)
    repo_candidates = [p for p in repo_extract_dir.rglob("*") if p.is_dir() and (p / "pyproject.toml").exists() and (p / "src" / "scireason").exists()]
    if not repo_candidates:
        raise FileNotFoundError("Repository root not found after extracting the archive")
    REPO_ROOT = sorted(repo_candidates, key=lambda p: len(str(p)))[0]
else:
    REPO_ROOT = source_path
print("[repo] root =", REPO_ROOT)
```

--------------------------------

### Install Base Notebook Dependencies

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/task3_dual_local_models_blind_ab_guarded_fixed.ipynb

Installs essential Python packages required for the notebook runtime if they are missing. It conditionally skips the installation in offline mode.

```python
_required_boot_modules = ['ipywidgets', 'yaml', 'requests', 'nbformat']
_missing_boot_modules = [name for name in _required_boot_modules if not _task3_module_available(name)]
if _missing_boot_modules and not TASK3_RUNTIME_OFFLINE:
    run_optional([sys.executable, '-m', 'pip', 'install', '-q', 'ipywidgets', 'pyyaml', 'requests', 'unidecode', 'nbformat'])
elif _missing_boot_modules:
    print('[warn] offline mode: skipping pip install for base notebook dependencies ->', ', '.join(_missing_boot_modules))
```

--------------------------------

### Apply Quick Setup Configuration

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_kaggle_offline.ipynb

Applies the current settings from the quick setup UI to global variables.
It also checks for initial YAML override errors and displays a warning if any are found.

```python
_task3_quick_apply_to_globals(show_message=False)
if TASK3_QUICK_INITIAL_YAML_ERROR:
    quick_status_html.value = _task3_quick_render_box(
        'There is a problem in the saved YAML override',
        lines=[TASK3_QUICK_INITIAL_YAML_ERROR],
        tone='warning',
    )
```

--------------------------------

### Build Initial Setup for Task 3

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_kaggle_offline.ipynb

Constructs the initial setup configuration for Task 3 by layering overrides on top of the defaults, in this order: the `TASK3_DUAL_SETUP` dict, a JSON string (`TASK3_DUAL_SETUP_JSON`, global or environment variable), a YAML file (`TASK3_DUAL_SETUP_PATH`, global or environment variable), and inline YAML text (`TASK3_DUAL_SETUP_YAML`). Errors in the JSON string or the YAML file are silently ignored; a parse error in the inline YAML is returned to the caller.

```python
def _task3_quick_build_initial_setup():
    merged = _task3_quick_default_setup()
    # Layer 1: a dict defined directly in the notebook globals.
    raw_setup = globals().get('TASK3_DUAL_SETUP')
    if isinstance(raw_setup, dict):
        merged = _task3_quick_deep_update(merged, raw_setup)
    # Layer 2: a JSON string from globals or the environment.
    json_text = str(globals().get('TASK3_DUAL_SETUP_JSON') or os.environ.get('TASK3_DUAL_SETUP_JSON') or '').strip()
    if json_text:
        try:
            payload = json.loads(json_text)
            if isinstance(payload, dict):
                merged = _task3_quick_deep_update(merged, payload)
        except Exception:
            pass
    # Layer 3: a YAML file referenced by path.
    setup_path = str(globals().get('TASK3_DUAL_SETUP_PATH') or os.environ.get('TASK3_DUAL_SETUP_PATH') or '').strip()
    if setup_path:
        setup_file = Path(setup_path).expanduser()
        if setup_file.exists():
            try:
                payload = yaml.safe_load(setup_file.read_text(encoding='utf-8')) or {}
                if isinstance(payload, dict):
                    merged = _task3_quick_deep_update(merged, payload)
            except Exception:
                pass
    # Layer 4: inline YAML text; its parse error is reported to the caller.
    yaml_text = str(globals().get('TASK3_DUAL_SETUP_YAML') or '').strip()
    yaml_payload, yaml_error = _task3_quick_parse_yaml(yaml_text)
    if yaml_payload:
        merged = _task3_quick_deep_update(merged, yaml_payload)
    return merged, yaml_text, yaml_error


TASK3_QUICK_INITIAL_SETUP, TASK3_QUICK_INITIAL_YAML_TEXT, TASK3_QUICK_INITIAL_YAML_ERROR = _task3_quick_build_initial_setup()
```
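
The layered merge above relies on a recursive update helper, `_task3_quick_deep_update`, whose body is not shown in this snippet. A minimal sketch of how such a helper could behave, assuming nested dicts are merged key by key and any other value is overwritten by the later layer. The name `deep_update` and the sample keys are hypothetical, and the real helper may differ. Returning a new dict rather than mutating `base` keeps the defaults reusable across repeated builds.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_update(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `override` merged into `base` recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them instead of replacing the whole branch.
            merged[key] = deep_update(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the later layer wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical layers, applied in increasing priority.
defaults = {"flags": {"hf_offline": False, "run_vlm": True}, "query": ""}
from_json = {"flags": {"hf_offline": True}}
from_yaml = {"query": "graph neural networks"}

setup = defaults
for layer in (from_json, from_yaml):
    setup = deep_update(setup, layer)
print(setup)
```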
--------------------------------

### Install and Import Libraries for Task 3

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/task3_multimodal_temporal_hypothesis_generation.ipynb

Installs the task3 package in editable mode and imports core libraries. This cell should be run first. When a smoke-test or skip-optional-deps environment variable is set, the install command is switched to `--no-deps`.

```python
# Note: this excerpt assumes `_task3_editable_install` already holds the default install command.
if os.environ.get('TASK3_NOTEBOOK_SMOKE') == '1' or os.environ.get('TASK3_DUAL_NOTEBOOK_SMOKE') == '1' or os.environ.get('TASK3_NOTEBOOK_SKIP_OPTIONAL_DEPS') == '1':
    _task3_editable_install = [sys.executable, '-m', 'pip', 'install', '-q', '-e', '.', '--no-deps']
run_optional(_task3_editable_install, cwd=REPO_DIR, label='pip install editable task3 notebook runtime')

import yaml
import ipywidgets as W
from IPython.display import HTML, Markdown, display, clear_output
from unidecode import unidecode

from scireason.config import settings
from scireason.llm import list_available_g4f_models
from scireason.pipeline.task3_hypothesis_generation import prepare_task3_hypothesis_bundle
from scireason.task3_offline_review import (
    build_task3_expert_artifact_bundle,
    build_task3_offline_review_package,
)

TASK3_DEFAULT_G4F_MODEL = getattr(settings, 'task2_default_g4f_model', 'auto') or 'auto'
TASK3_DEFAULT_LOCAL_VLM_MODEL = getattr(settings, 'vlm_model_id', 'Qwen/Qwen2.5-VL-3B-Instruct') or 'Qwen/Qwen2.5-VL-3B-Instruct'
TASK3_LAST_RUN = None
TASK3_LAST_TASK_META = None
TASK3_LAST_ARTIFACTS = None

print('REPO_DIR =', REPO_DIR)
print('SRC_DIR =', SRC_DIR)
print('task3 default g4f model =', TASK3_DEFAULT_G4F_MODEL)
print('task3 default local VLM model =', TASK3_DEFAULT_LOCAL_VLM_MODEL)
```

--------------------------------

### Environment Setup and Path Initialization

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/notebooks/task3_dual_local_models_blind_ab_kaggle_gdown_script_fixed.ipynb

Initializes environment variables and directories for Kaggle operations, including Hugging Face cache,
VLM timeouts, and DPI settings. Creates necessary workspace and output directories.

```python
import importlib
import json
import os
import shutil
import subprocess
import sys
import zipfile
from pathlib import Path

WORKSPACE_DIR = Path(CFG["workspace_dir"]).expanduser()
OUTPUT_DIR = Path(CFG["output_dir"]).expanduser()
HF_HOME = Path(str(CFG.get("hf_home") or "/kaggle/working/.hf")).expanduser()
HF_HOME.mkdir(parents=True, exist_ok=True)

# setdefault keeps any value that is already present in the environment.
os.environ.setdefault("HF_HOME", str(HF_HOME))
os.environ.setdefault("HF_HUB_CACHE", str(HF_HOME / "hub"))
os.environ.setdefault("TRANSFORMERS_CACHE", str(HF_HOME / "hub"))
os.environ.setdefault("LOCAL_VLM_REQUEST_TIMEOUT_SECONDS", str(CFG.get("local_vlm_request_timeout_seconds") or 120))
os.environ.setdefault("SCIREASON_LOCAL_VLM_REQUEST_TIMEOUT_SECONDS", os.environ.get("LOCAL_VLM_REQUEST_TIMEOUT_SECONDS", "120"))
os.environ.setdefault("PDF_RENDER_DPI", str(CFG.get("pdf_render_dpi") or 110))
os.environ.setdefault("VLM_MAX_PIXELS", str(CFG.get("vlm_max_pixels") or (1024 * 28 * 28)))
os.environ.setdefault("HF_HUB_DISABLE_PROGRESS_BARS", "1")
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")

WORKSPACE_DIR.mkdir(parents=True, exist_ok=True)
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
```

--------------------------------

### Editable Install Task 3 Package

Source: https://github.com/visualcomments/top-papers-graph.git/blob/main/task3_dual_local_models_blind_ab_guarded_fixed.ipynb

Installs the Task 3 package in editable mode. The install uses `--no-deps` in offline mode or when a smoke-test or skip-optional-deps environment variable is set, and it is retried with `--no-deps` if the first attempt fails.
```python
_task3_editable_install = [sys.executable, '-m', 'pip', 'install', '-q', '-e', '.[task3]']
if TASK3_RUNTIME_OFFLINE or os.environ.get('TASK3_NOTEBOOK_SMOKE') == '1' or os.environ.get('TASK3_DUAL_NOTEBOOK_SMOKE') == '1' or os.environ.get('TASK3_NOTEBOOK_SKIP_OPTIONAL_DEPS') == '1':
    _task3_editable_install = [sys.executable, '-m', 'pip', 'install', '-q', '-e', '.', '--no-deps']
if not run_optional(_task3_editable_install, cwd=REPO_DIR, label='pip install editable task3 notebook runtime'):
    run_optional([sys.executable, '-m', 'pip', 'install', '-q', '-e', '.', '--no-deps'], cwd=REPO_DIR, label='pip install editable task3 notebook runtime --no-deps fallback')
```
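
`run_optional` itself is not shown in these snippets. The fallback logic only requires that it return a truthy value on success and a falsy one on failure instead of raising. A minimal sketch under that assumption follows; the signature is inferred from the calls above, the two stand-in commands are hypothetical, and the real helper may differ.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional, Sequence, Union


def run_optional(
    cmd: Sequence[str],
    cwd: Optional[Union[str, Path]] = None,
    label: Optional[str] = None,
) -> bool:
    """Run a command; on failure print a warning and return False instead of raising."""
    name = label or " ".join(map(str, cmd))
    try:
        completed = subprocess.run([str(part) for part in cmd], cwd=cwd, check=False)
    except OSError as exc:  # e.g. the executable does not exist
        print(f"[warn] {name}: could not start ({exc})")
        return False
    if completed.returncode != 0:
        print(f"[warn] {name}: exit code {completed.returncode}")
        return False
    return True


# Hypothetical stand-ins for the pip commands: the first fails, the fallback succeeds.
primary = [sys.executable, "-c", "import sys; sys.exit(1)"]
fallback = [sys.executable, "-c", "pass"]
installed = run_optional(primary, label="primary install") or run_optional(fallback, label="fallback install")
print("installed =", installed)
```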