!pip install pandas networkx numpy matplotlib plotly

Successfully installed contourpy-1.3.3 cycler-0.12.1 fonttools-4.62.1 kiwisolver-1.5.0 matplotlib-3.10.8 narwhals-2.19.0 networkx-3.6.1 numpy-2.4.4 pandas-3.0.2 pillow-12.2.0 plotly-6.7.0 pyparsing-3.3.2

!pip install seaborn scikit-learn

Successfully installed joblib-1.5.3 scikit-learn-1.8.0 scipy-1.17.1 seaborn-0.13.2 threadpoolctl-3.6.0

# Cell 1: Imports and configuration
# of the complex-systems analysis environment

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import FancyBboxPatch
import seaborn as sns
from datetime import datetime
import re
from collections import Counter, defaultdict
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_extraction.text import CountVectorizer
import warnings
warnings.filterwarnings('ignore')

# Plot defaults for the complex-systems visualizations
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 11
plt.rcParams['lines.linewidth'] = 1.5
sns.set_style("whitegrid")
sns.set_palette("viridis")

print("Environment configured for co-evolution analysis of complex adaptive systems")
print("Libraries loaded successfully")
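
Cell 1 silences every warning with `warnings.filterwarnings('ignore')`, which also hides deprecation notices that can signal API changes between library versions. A narrower option, sketched below, is to suppress one warning category inside a limited block. The function `noisy_step` is a hypothetical stand-in for any call that emits a known, harmless warning; it is not part of the notebook.

```python
import warnings

def noisy_step():
    """Hypothetical stand-in for a call that emits a known, harmless warning."""
    warnings.warn("harmless notice", UserWarning)
    return 42

# Suppress only UserWarning, and only inside this block
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    result = noisy_step()

# Outside that block the earlier filters are restored; record what is emitted
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_step()

print(result, len(caught))
```

With this pattern, warnings from other parts of the analysis stay visible while the known noise is filtered out.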

# Cell 2: Ingest the two datasets and merge them into a single temporal structure
# Load the CSV files
# 265 records on vocational education
df_education = pd.read_csv('A.csv')
# 780 records on economic development
df_economy = pd.read_csv('C.csv')

# Check the temporal structure
print(f"Education dataset: {len(df_education)} records")
print(f"  Year range: {df_education['publication year'].min()} - {df_education['publication year'].max()}")
print(f"Economy dataset: {len(df_economy)} records")
print(f"  Year range: {df_economy['publication year'].min()} - {df_economy['publication year'].max()}")

# Merge into a single dataframe with a source column
df_education['dataset_source'] = 'education_etp'
df_economy['dataset_source'] = 'economy'

df_combined = pd.concat([df_education, df_economy], ignore_index=True)
df_combined['publication year'] = df_combined['publication year'].astype(int)

# Check the unified temporal coverage
print(f"\nCombined dataset: {len(df_combined)} records")
print(f"Full temporal coverage: {df_combined['publication year'].min()} - {df_combined['publication year'].max()}")

# Show the distribution by year
year_distribution = df_combined.groupby(['publication year', 'dataset_source']).size().unstack(fill_value=0)
print("\nAnnual distribution by source:")
print(year_distribution.tail(22))

df_combined.head(3)

Education dataset: 265 records
  Year range: 2006 - 2026
Economy dataset: 780 records
  Year range: 2006 - 2026

Combined dataset: 1045 records
Full temporal coverage: 2006 - 2026

Annual distribution by source:
dataset_source    economy  education_etp
publication year                        
2006                    9              1
2007                   21              2
2008                   23              5
2009                   31             15
2010                   22              6
2011                   30              5
2012                   27             14
2013                   36              6
2014                   41             11
2015                   38              7
2016                   41             11
2017                   37             20
2018                   39             12
2019                   53             16
2020                   51             26
2021                   59             27
2022                   51             12
2023                   51             18
2024                   45             24
2025                   61             21
2026                   14              6
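
Cell 2 casts `publication year` with `astype(int)`, which raises an error if any value is missing or non-numeric. The sketch below shows one defensive variant on two tiny invented frames (the rows are made up and stand in for `A.csv` and `C.csv`): unparseable years are coerced to NaN, counted, and dropped before the cast.

```python
import pandas as pd

# Toy stand-ins for A.csv and C.csv (values are invented for illustration)
edu = pd.DataFrame({"publication year": ["2019", "2020", None]})
eco = pd.DataFrame({"publication year": [2019, "n/a", 2020, 2020]})

edu["dataset_source"] = "education_etp"
eco["dataset_source"] = "economy"
combined = pd.concat([edu, eco], ignore_index=True)

# Coerce unparseable years to NaN instead of raising, then count and drop them
years = pd.to_numeric(combined["publication year"], errors="coerce")
n_bad = int(years.isna().sum())
combined = combined.loc[years.notna()].copy()
combined["publication year"] = years.dropna().astype(int)

# Same year-by-source table as in Cell 2
dist = (combined.groupby(["publication year", "dataset_source"])
        .size().unstack(fill_value=0))
print(f"Dropped {n_bad} rows with unusable years")
print(dist)
```

Reporting the number of dropped rows keeps the record counts auditable against the totals printed by the notebook.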

# Cell 5: Construction of the system state variables
# Variable 1: Degree of Openness of the Technical-Vocational
# Learning System (GA)
# Intended to capture how strongly the S-ATP is oriented toward the
# business context. Implemented as a saturating transform of score_context:
# 0 when the score is 0, 0.5 at a score of 0.1, approaching 1 as the score grows.
df_combined['GA'] = df_combined['score_context'] / (df_combined['score_context'] + 0.1)
# Variable 2: Heterogeneity of the Economic System (HH)
# Proxy for the diversity of economic objects mentioned in economy articles:
# rolling standard deviation of score_object over 5 consecutive economy rows
# (in file order, not grouped by year)
economic_mask = df_combined['dataset_source'] == 'economy'
df_combined['HH'] = np.nan
df_combined.loc[economic_mask, 'HH'] = df_combined.loc[economic_mask, 'score_object'].rolling(5, min_periods=1).std()
# Variable 3: System Learning Velocity (VAS)
# Proxy: share of articles that use predictive/complex technology
# rather than descriptive methods
high_tech_keywords = ['machine learning', 'neural network', 'deep learning', 'simulation', 
                      'modeling', 'optimization', 'forecasting', 'prediction', 'genetic',
                      'bayesian', 'complex', 'nonlinear', 'chaos', 'agent-based', 'nlp',
                      'big data', 'analytics', 'data mining', 'artificial intelligence']

def is_high_tech(text):
    """Return 1 if the text mentions a predictive or simulation technology, else 0."""
    if pd.isna(text) or str(text).strip() == '':
        return 0
    text_lower = str(text).lower()
    # Substring match: 'complex' also matches 'complexity', for example
    return int(any(kw in text_lower for kw in high_tech_keywords))

# is_high_tech already handles missing values, so no extra guard is needed here
df_combined['is_high_tech'] = df_combined['technology'].apply(is_high_tech)

# VAS as the yearly share of high-tech articles
yearly_high_tech = df_combined.groupby('publication year')['is_high_tech'].mean().to_dict()
df_combined['VAS'] = df_combined['publication year'].map(yearly_high_tech).fillna(0)
# Variable 4: Level of Co-Specialization (NCE)
# Proxy for how often educational action and economic purpose
# appear together: product of score_action and score_purpose
df_combined['co_specialization'] = df_combined['score_action'] * df_combined['score_purpose']
# Yearly average
yearly_nce = df_combined.groupby('publication year')['co_specialization'].mean().to_dict()
df_combined['NCE'] = df_combined['publication year'].map(yearly_nce).fillna(0)
# Aggregate by year for visualization
yearly_state = df_combined.groupby('publication year').agg({
    'GA': 'mean',
    'HH': 'mean',
    'VAS': 'mean',
    'NCE': 'mean'
}).reset_index()
# Fill missing HH values with .ffill(); fillna(method='ffill') is
# deprecated in recent pandas versions
yearly_state['HH'] = yearly_state['HH'].ffill().fillna(0)

print("State variables built successfully")
print(f"Year range: {yearly_state['publication year'].min()} - {yearly_state['publication year'].max()}")
print(f"Total years with data: {len(yearly_state)}")
print("\nFirst 10 years of the time series:")
print(yearly_state.head(10))
print("\nDescriptive statistics of the state variables:")
print(yearly_state[['GA', 'HH', 'VAS', 'NCE']].describe())
# Additional check for null values
print(f"\nRemaining null values: GA={yearly_state['GA'].isna().sum()}, HH={yearly_state['HH'].isna().sum()}, VAS={yearly_state['VAS'].isna().sum()}, NCE={yearly_state['NCE'].isna().sum()}")

State variables built successfully
Year range: 2006 - 2026
Total years with data: 21

First 10 years of the time series:
   publication year        GA        HH       VAS       NCE
0              2006  0.356443  0.045362  0.000000  0.003403
1              2007  0.270568  0.057499  0.086957  0.004801
2              2008  0.254619  0.044154  0.035714  0.002827
3              2009  0.247604  0.034841  0.021739  0.003684
4              2010  0.286960  0.055863  0.000000  0.003770
5              2011  0.303691  0.038946  0.000000  0.003234
6              2012  0.267429  0.052485  0.048780  0.003354
7              2013  0.344688  0.043104  0.095238  0.004630
8              2014  0.279800  0.044561  0.019231  0.005128
9              2015  0.294190  0.053912  0.066667  0.005710

Descriptive statistics of the state variables:
              GA         HH        VAS        NCE
count  21.000000  21.000000  21.000000  21.000000
mean    0.284803   0.050814   0.048412   0.005769
std     0.032311   0.007357   0.044184   0.002609
min     0.236443   0.034841   0.000000   0.002827
25%     0.269467   0.045362   0.019231   0.003770
50%     0.278000   0.052485   0.043478   0.005128
75%     0.294190   0.055449   0.060976   0.006080
max     0.356443   0.063137   0.200000   0.012279

Remaining null values: GA=0, HH=0, VAS=0, NCE=0
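
In Cell 5 the rolling window for HH runs over economy rows in file order, so a 5-row window can straddle more than one publication year. If per-year dispersion is what HH is meant to capture (an assumption, not something the notebook states), one alternative is to compute the standard deviation of `score_object` within each year. The sketch below uses invented rows and is not a drop-in replacement: it would change the HH values reported above.

```python
import pandas as pd

# Invented rows for illustration; column names follow the notebook
df = pd.DataFrame({
    "publication year": [2006, 2006, 2006, 2007, 2007, 2006],
    "dataset_source": ["economy", "economy", "education_etp",
                       "economy", "economy", "economy"],
    "score_object": [0.05, 0.10, 0.30, 0.08, 0.12, 0.15],
})

# Keep economy rows only, then take the sample standard deviation per year
eco = df[df["dataset_source"] == "economy"]
hh_by_year = eco.groupby("publication year")["score_object"].std()

print(hh_by_year)
```

Because the grouping is by year, the result does not depend on the order of rows in the CSV files.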

# Cell 6: Visualization and summary tables of the {S-ATP, SE} co-evolution
import plotly.express as px
import plotly.graph_objects as go
from sklearn.preprocessing import MinMaxScaler

# 1. STYLE AND PHASE CONFIGURATION
if 'yearly_state' not in locals():
    print("ERROR: 'yearly_state' is not defined.")
else:
    # Phase boundaries (inclusive years); the data start in 2006,
    # so Phase 1 contains no observations here
    phase_definitions = {
        'Phase 1: Pre-EU': (2001, 2004),
        'Phase 2: Shock & Crisis': (2005, 2009),
        'Phase 3: Reconfiguration': (2010, 2015),
        'Phase 4: Digital Maturation': (2016, 2026)
    }
    
    # High-visibility palette tied to phase semantics (cool -> warm -> stable)
    phase_colors = {
        'Phase 1: Pre-EU': '#B0BEC5',      # Blue-gray (inertia)
        'Phase 2: Shock & Crisis': '#FF8A65', # Orange (alert)
        'Phase 3: Reconfiguration': '#4DB6AC', # Teal (adaptation)
        'Phase 4: Digital Maturation': '#5C6BC0' # Indigo (complexity)
    }

    # Assign a phase label to each year for Plotly
    def get_phase_label(year):
        for phase, (start, end) in phase_definitions.items():
            if start <= year <= end:
                return phase
        return 'Other'
    
    yearly_state['fase_label'] = yearly_state['publication year'].apply(get_phase_label)

    # 2. INTERACTIVE PHASE SPACE (PLOTLY)
    # Replaces the static plot so that specific years can be explored
    # Each column is scaled independently to [0, 1]; ravel() flattens the
    # (n, 1) array returned by fit_transform before assignment
    scaler = MinMaxScaler()
    yearly_state['GA_norm'] = scaler.fit_transform(yearly_state[['GA']]).ravel()
    yearly_state['VAS_norm'] = scaler.fit_transform(yearly_state[['VAS']]).ravel()

    fig_interact = px.scatter(
        yearly_state, x='GA_norm', y='VAS_norm',
        color='fase_label', size='NCE', 
        hover_data=['publication year', 'GA', 'VAS', 'NCE'],
        text='publication year',
        color_discrete_map=phase_colors,
        title='<b>Co-Evolution Attractor {S-ATP, SE}</b><br><sup>Interactive phase-space trajectory 2006-2026</sup>',
        labels={'GA_norm': 'Openness (GA) norm.', 'VAS_norm': 'Learning velocity (VAS) norm.'}
    )

    # Add the trajectory line connecting consecutive years
    fig_interact.add_trace(go.Scatter(
        x=yearly_state['GA_norm'], y=yearly_state['VAS_norm'],
        mode='lines', line=dict(color='rgba(100,100,100,0.2)', width=1),
        showlegend=False
    ))

    fig_interact.update_layout(
        template='plotly_white',
        legend_title_text='Systemic Phases',
        font=dict(family="Arial", size=12)
    )
    
    fig_interact.show()

    # 3. TIME SERIES (MATPLOTLIB)
    plt.style.use('seaborn-v0_8-whitegrid')
    fig, axes = plt.subplots(2, 2, figsize=(15, 11), facecolor='#F8F9FA')
    vars_to_plot = [('GA', '#1A237E', 'Openness'), ('HH', '#E65100', 'Heterogeneity'), 
                    ('VAS', '#1B5E20', 'Velocity'), ('NCE', '#4A148C', 'Co-Specialization')]

    for i, (var, col, title) in enumerate(vars_to_plot):
        ax = axes[i//2, i%2]
        ax.plot(yearly_state['publication year'], yearly_state[var], 'o-', 
                color=col, linewidth=2.5, markersize=7, markerfacecolor='white')
        
        # Shade each phase
        for phase, (start, end) in phase_definitions.items():
            ax.axvspan(start-0.5, end+0.5, color=phase_colors[phase], alpha=0.15)
            
        # Matplotlib does not render HTML tags, so bold is set via fontweight
        ax.set_title(f'{title} ({var})', fontsize=13, fontweight='bold', pad=10)
        ax.grid(True, axis='y', alpha=0.4)
        ax.spines[['top', 'right']].set_visible(False)

    plt.tight_layout(rect=[0, 0.03, 1, 0.95])
    fig.suptitle('Time-Series Analysis: State Variables of the Dyadic System', fontsize=16, fontweight='bold')
    plt.show()

    # 4. STYLED SUMMARY TABLE
    print("\n" + " SUMMARY OF SYSTEMIC EVOLUTION ".center(100, "="))
    display(yearly_state[['publication year', 'GA', 'HH', 'VAS', 'NCE']]
            .round(4).style.background_gradient(cmap='Blues', subset=['VAS', 'NCE']))

================================== SUMMARY OF SYSTEMIC EVOLUTION ===================================
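
Cell 6 rescales GA and VAS with `MinMaxScaler`, which by default maps each column to [0, 1] as (x - min) / (max - min). The arithmetic can be sketched in plain Python using the GA minimum and maximum from the descriptive statistics of Cell 5 and a few yearly values from the printed series. The helper name `min_max` is illustrative and does not appear in the notebook.

```python
def min_max(x, lo, hi):
    """Rescale x from the interval [lo, hi] to [0, 1]."""
    return (x - lo) / (hi - lo)

# GA extremes taken from the describe() output of Cell 5
ga_min, ga_max = 0.236443, 0.356443

# A few yearly GA values taken from the printed time series
ga_by_year = {2006: 0.356443, 2007: 0.270568, 2009: 0.247604}

for year, value in ga_by_year.items():
    print(year, round(min_max(value, ga_min, ga_max), 4))
```

Since 2006 holds the series maximum, it lands at the right edge of the normalized GA axis in the phase-space plot.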

	id	article title	source title	author keywords	abstract	publication date	publication year	volume	issue	start page	...	early access date	ut (unique wos id)	action	object	purpose	context	technology	dataset_source
0	1	Current training for healthcare students on di...	Educational gerontology	Attitudes	With the increasing number of older adults and...	2026 mar 25	2026	NaN	NaN	NaN	...	Mar 2026	Wos:001723355200001	evaluate	curriculum content	combat ageism	healthcare education	thematic analysis	education_etp
1	2	Workplace stress and well-being in nursing: in...	Healthcare	Workplace stress; nurses; hse management stand...	Background: work-related stress represents a m...	Mar 18	2026	14.0	6	NaN	...	NaN	Wos:001725574000001	assess	workplace stress	identify factors	healthcare workforce	cross-sectional study	education_etp
2	3	Inequity in access to palliative care services...	Radiology and oncology	Palliative care; inaccessibility; opioids; bas...	Background palliative care aims to enhance the...	Mar 1	2026	60.0	1	15	...	Feb 2026	Wos:001705278900001	analyze	palliative care	reduce inequity	global health	epidemiology	education_etp
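
The `technology` column in the preview above holds short free-text method labels such as "thematic analysis" or "cross-sectional study". Cell 5 flags a row as high-tech with a plain substring test, so a keyword like `complex` also matches "complexity", which may or may not be intended. The sketch below shows a stricter whole-word variant using the standard `re` module. The function name `is_high_tech_strict` and the shortened keyword list are assumptions for illustration, and switching to it would change the VAS values.

```python
import re

# Subset of the notebook's keyword list, for illustration only
keywords = ["machine learning", "simulation", "nlp", "complex", "genetic"]

# One compiled pattern; \b anchors each keyword at word boundaries
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
    flags=re.IGNORECASE,
)

def is_high_tech_strict(text):
    """Return 1 if any keyword occurs as a whole word or phrase, else 0."""
    if not isinstance(text, str) or not text.strip():
        return 0
    return int(pattern.search(text) is not None)

examples = ["NLP topic modelling", "complexity theory", "agent simulation", None]
flags = [is_high_tech_strict(t) for t in examples]
print(flags)
```

Here "complexity theory" is not flagged, whereas the substring test in Cell 5 would flag it.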

       score_action  score_object  score_purpose  score_context  score_technology
count   1045.000000   1045.000000    1045.000000    1045.000000       1045.000000
mean       0.092703      0.095920       0.055396       0.047045          0.026124
std        0.054634      0.057066       0.047369       0.037916          0.033053
min        0.000000      0.000000       0.000000       0.000000          0.000000
25%        0.050000      0.052632       0.027778       0.027027          0.000000
50%        0.075000      0.078947       0.055556       0.054054          0.025000
75%        0.125000      0.131579       0.083333       0.081081          0.050000
max        0.575000      0.342105       0.444444       0.324324          0.400000
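
The `score_context` quantiles in the table above make it possible to see what the GA formula from Cell 5, `score_context / (score_context + 0.1)`, does to typical values. The sketch below evaluates that saturating transform at each quantile. The function name `ga` and the parameter name `k` are illustrative; the notebook hard-codes the constant 0.1.

```python
def ga(score_context, k=0.1):
    """Saturating transform used for GA in Cell 5: s / (s + k)."""
    return score_context / (score_context + k)

# score_context quantiles copied from the descriptive table above
quantiles = {
    "min": 0.0,
    "25%": 0.027027,
    "50%": 0.054054,
    "75%": 0.081081,
    "max": 0.324324,
}

for name, s in quantiles.items():
    print(f"{name:>4}: score_context={s:.6f} -> GA={ga(s):.4f}")
```

The transform is 0 at a score of 0 and reaches 0.5 when the score equals the constant, so with scores this small most rows sit in the lower half of the GA range.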

    publication year      GA      HH     VAS     NCE
0               2006  0.3564  0.0454  0.0000  0.0034
1               2007  0.2706  0.0575  0.0870  0.0048
2               2008  0.2546  0.0442  0.0357  0.0028
3               2009  0.2476  0.0348  0.0217  0.0037
4               2010  0.2870  0.0559  0.0000  0.0038
5               2011  0.3037  0.0389  0.0000  0.0032
6               2012  0.2674  0.0525  0.0488  0.0034
7               2013  0.3447  0.0431  0.0952  0.0046
8               2014  0.2798  0.0446  0.0192  0.0051
9               2015  0.2942  0.0539  0.0667  0.0057
10              2016  0.2695  0.0486  0.0192  0.0042
11              2017  0.2396  0.0546  0.0526  0.0052
12              2018  0.3215  0.0553  0.0588  0.0060
13              2019  0.2759  0.0545  0.0725  0.0123
14              2020  0.2364  0.0496  0.0130  0.0048
15              2021  0.2765  0.0482  0.0465  0.0061
16              2022  0.2780  0.0570  0.0317  0.0071
17              2023  0.2709  0.0471  0.0435  0.0061
18              2024  0.2847  0.0629  0.0435  0.0089
19              2025  0.2867  0.0554  0.0610  0.0084
20              2026  0.3352  0.0631  0.2000  0.0116
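
The phase boundaries defined in Cell 6 start in 2001, while the summary table covers 2006 to 2026. A quick check of how many observed years fall into each phase can be sketched as follows. The phase labels are shortened here for brevity; the year ranges are the ones from Cell 6.

```python
from collections import Counter

# Phase boundaries as in Cell 6 (labels shortened for this sketch)
phase_definitions = {
    "Phase 1": (2001, 2004),
    "Phase 2": (2005, 2009),
    "Phase 3": (2010, 2015),
    "Phase 4": (2016, 2026),
}

def get_phase_label(year):
    """Return the phase whose inclusive year range contains `year`."""
    for phase, (start, end) in phase_definitions.items():
        if start <= year <= end:
            return phase
    return "Other"

years = range(2006, 2027)  # years present in the summary table
counts = Counter(get_phase_label(y) for y in years)

for phase in phase_definitions:
    print(phase, counts[phase])
```

No observed year falls into the first phase, so its color appears in the time-series shading but not among the scatter points.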